Exercise 1b: Visualizing AM Jobs Data

Prompt:

In the file am_jobdata.csv you will find information on the first job outcomes of college graduates from India, including their salaries, test scores, high school grades and other data. We'll try to make some interesting graphs in order to:

Help us understand what the data set is
Make some broad inferences about salary, college performance, spatial/gender distribution
Get some intuitive understanding of the salary data and how it relates to other variables

In any data analysis exercise, visualization is a good first step that can confirm assumptions or shine light on some promising directions for analysis.

See the Seaborn Gallery for ideas and directions on types of graphs that can be plotted.

Hints:

To get started, you may load the .csv file into Excel, Google Sheets or some other spreadsheet program to see what the data is and what its range is.
We'll be using seaborn to help us plot the data. Seaborn is a library that builds on matplotlib and helps make visually appealing graphs. Head to the Seaborn tutorials page for some help on how to get started with plotting graphs in seaborn.
Think carefully about what types of graphs you want to plot: histograms, scatterplots, box plots, violin plots and grouped data plots can all be useful.



In [113]:

    
# import libraries

import matplotlib
import IPython
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
import pylab
import seaborn as sns

%matplotlib inline



In [114]:

    
# load the data
# hint : look up how to read .csv files in pandas
job_data = """ fill in something here!"""
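
If you get stuck on the loading step, here is one possible sketch of how pandas reads CSV data. It uses a tiny in-memory stand-in for the file so it runs on its own; the name `sample_csv` and the rows in it are made up for illustration, and the column names are assumptions based on the cells further down (`salary`, `gender`). In the notebook you would pass the path to am_jobdata.csv instead.

```python
import io
import pandas as pd

# Stand-in for am_jobdata.csv; these rows are invented for illustration only.
sample_csv = io.StringIO(
    "salary,gender\n"
    "420000,1\n"
    "500000,0\n"
    ",1\n"
)

# pd.read_csv accepts a file path or a file-like object.
# In the notebook this would be something like pd.read_csv("am_jobdata.csv").
sample_data = pd.read_csv(sample_csv)

print(sample_data.shape)   # (rows, columns)
print(sample_data.dtypes)  # salary is read as float64 because one value is missing
print(sample_data.head())
```

Checking `shape`, `dtypes` and `head()` right after loading is a quick way to confirm that the columns were parsed the way you expect before plotting anything.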



In [27]:

    
# Let's see what the salaries look like!

# Series.max() / Series.min() skip missing values (NaN) by default
mx = job_data.salary.max()
mn = job_data.salary.min()

print("Max is : " + str(mx))
print("Min is : " + str(mn))









    



Max is : 4000000.0
Min is : 35000.0
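
The maximum and minimum only show the extremes. As a rough sketch of how to get a feel for the spread as well, pandas' `describe()` and `median()` can be used on the salary column. The numbers below are invented so the snippet runs without the data file; only the two extremes are borrowed from the output above.

```python
import numpy as np
import pandas as pd

# Invented salaries, used only so the sketch runs without am_jobdata.csv.
salaries = pd.Series(
    [35000, 180000, 300000, 325000, 400000, np.nan, 4000000],
    name="salary",
    dtype="float64",
)

# describe() reports count, mean, std, min, quartiles and max, ignoring NaN.
summary = salaries.describe()
print(summary)
print("Median is : " + str(salaries.median()))
```

When the mean sits well above the median, as it does in this toy series, a few very large values are pulling the distribution to the right, which is worth keeping in mind when choosing bins or axis scales.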



In [129]:

    
# Histogram of salaries (100 bins) with a shaded kernel density estimate on top
sns.distplot(job_data.salary.dropna(), bins=100, rug=False)
sns.kdeplot(job_data.salary.dropna(), shade=True, color='blue')
# sns.rugplot(job_data.salary.dropna(), color='pink')









    Out[129]:





<matplotlib.axes._subplots.AxesSubplot at 0x1f3797b10>



In [128]:

    
# Let's plot males (gender == 1) vs. females (gender == 0)!
# Create data categories
m = job_data['gender']==1
f = job_data['gender']==0

sns.kdeplot(job_data[m].salary.dropna(),color='b')
sns.kdeplot(job_data[f].salary.dropna(),color='pink')









    Out[128]:





<matplotlib.axes._subplots.AxesSubplot at 0x1f349f310>
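
To put numbers next to the two density curves, one option is to summarize salary per gender with a pandas `groupby`. This is only a sketch on invented rows; it assumes the same coding as the cell above (1 = male, 0 = female) and the column names `gender` and `salary`.

```python
import numpy as np
import pandas as pd

# Invented rows for illustration; gender coded as in the cell above (1 = male, 0 = female).
sample = pd.DataFrame({
    "gender": [1, 1, 1, 0, 0, 0],
    "salary": [300000, 450000, np.nan, 320000, 400000, 360000],
})

# count, median and mean of salary per gender; missing salaries are ignored.
by_gender = sample.groupby("gender")["salary"].agg(["count", "median", "mean"])
print(by_gender)
```

Looking at the count alongside the median helps judge whether a visible gap between the curves rests on enough observations in each group.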